Variable-length Intervals in Homology Search

Authors

  • Abhijit Chattaraj
  • Hugh E. Williams
Abstract

Fast, accurate, and scalable techniques for homology searching of large genomic collections are becoming an increasingly important requirement as genomic sequence collections continue to double in size almost yearly. Almost all homology search techniques rely on extracting fixed-length overlapping subsequences from queries and database sequences, and comparing these as the first step in query evaluation; this is a feature of well-known tools such as FastA, BLAST, and our own Cafe technique. In this paper we discuss a novel, variable-length approach to extracting subsequences that is based on homology scoring matrices. Our motivation is to achieve a balance between the speed and accuracy of fixed-length choices, that is, to combine the speed of longer subsequence lengths with the accuracy of shorter ones. We show that incorporating this approach into our Cafe technique leads to a good compromise between accuracy and retrieval efficiency when searching with BLOSUM matrices sensitive to distant evolutionary relationships. We expect that the same results would be achieved with other homology search techniques.
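The contrast between fixed-length and variable-length extraction can be illustrated with a minimal Python sketch. The first function is the standard overlapping fixed-length scheme the abstract describes. The second is only a hypothetical stand-in for a scoring-matrix-driven rule: the abstract does not give the authors' actual rule, so the function names, the threshold and max_len parameters, and the toy self-match scores below are all assumptions made for illustration.

```python
def fixed_length_intervals(sequence, k):
    """Return every overlapping length-k substring with its start offset."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


# Toy self-match scores: illustrative values only, NOT a real scoring matrix.
TOY_SELF_SCORE = {"A": 4, "C": 9, "G": 6, "W": 11}
DEFAULT_SCORE = 5  # assumed score for any residue not listed above


def variable_length_intervals(sequence, threshold, max_len):
    """Hypothetical variable-length rule (an assumption, not the paper's method).

    From each start offset, extend the interval one residue at a time until
    the cumulative self-match score reaches `threshold`. Starts that cannot
    reach the threshold within `max_len` residues (or before the sequence
    ends) produce no interval.
    """
    intervals = []
    for start in range(len(sequence)):
        score = 0
        for end in range(start, min(start + max_len, len(sequence))):
            score += TOY_SELF_SCORE.get(sequence[end], DEFAULT_SCORE)
            if score >= threshold:
                intervals.append((start, sequence[start:end + 1]))
                break
    return intervals


if __name__ == "__main__":
    print(fixed_length_intervals("ACGT", 3))
    print(variable_length_intervals("WWAA", threshold=12, max_len=3))
```

Under this assumed rule, high-scoring residues yield short intervals and low-scoring residues yield longer ones, which is one plausible way a scoring matrix could trade off the speed of long intervals against the accuracy of short ones.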


Similar resources

Scheduled Review Methods for Controllable State Variables

In many real systems in which a state variable must be kept within an appropriate range, the length of control (review) intervals is taken to be constant. In such systems, when the costs of reviews and of out-of-range values of the state variable are considerable, this method may not be optimal. In this paper we let the length of review intervals be variable during each operating cycl...


On the Search Biases of Homologous Crossover in Linear Genetic Programming and Variable-length Genetic Algorithms

In this paper we study with a schema-theoretic approach and experiments the search biases produced by GP/GA homologous crossovers when applied to linear, variable-length representations. By specialising the schema theory for homologous crossovers we show that these operators are totally unbiased with respect to string length. Then, we provide a fixed point for the schema evolution equations whe...


On the Search Biases of Homologous Crossover in Linear Genetic Programming and Variable-length Genetic Algorithms

With a schema-theoretic approach and experiments we study the search biases produced by GP/GA homologous crossovers when applied to linear, variable-length representations. By specialising the schema theory for homologous crossovers we show that these operators are unbiased with respect to string length. Then, we provide a fixed point for the schema evolution equations where the population pres...


Expression analyses of endoglucanase gene in Penicillium oxalicum and Trichoderma viride

The expression of the endoglucanase gene and the protein profile of two fungal species, Penicillium oxalicum 1SMS and Trichoderma viride 156MS, both with high cellulase enzyme activity, were investigated. Fungal isolates were cultured on inducer CMC medium, and then the amounts of released sugar and protein were assayed every three days for a month, using arsenate molybdate reagent and Bradford method,...


A Schema-Theory-Based Extension of Geiringer's Theorem for Linear GP and Variable-length GAs under Homologous Crossover

In this paper we study, using a schema-theoretic approach, the search biases produced by GP homologous crossovers when applied to linear representations, such as those used in linear GP or in variable length GAs. The study naturally leads to generalisations of Geiringer’s theorem and of the notion of linkage equilibrium, which, until now, were applicable only to fixed-length representations. Th...



Publication date: 2004